Skip to content

Conversation

@kwvg
Copy link
Collaborator

@kwvg kwvg commented Aug 2, 2025

Additional Information

  • sphlib requires platforms to be SPH_64 in order to build variants like Blake512 (source) and Bmw512 (source). While other variants have alternate implementations for non-SPH_64 platforms like Groestl (source), non-SPH_64 platforms cannot build Blake512 or Bmw512.

    As X11 (and by extension, Dash) requires both, we can safely assume that Dash Core doesn't support non-SPH_64 targets and can remove fallback code.

  • To inform decisions when optimizing X11, this pull request introduces benchmarks for all constituent hash algorithms across the same set of data sizes (32b, 80b, 128b, 512b, 1024b, 2048b and 1M), all tests can be run with bench_dash --filter="Pow(.*)" and can be filtered by data size with bench_dash --filter="Pow(.*)1024b(.*)" (see below)

    Benchmarks:
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |                1.60 |      625,125,455.46 |    0.3% |           25.60 |            7.10 |  3.603 |           0.13 |    0.0% |      0.01 | `Pow_Blake512_1024b`
    |                0.75 |    1,333,435,941.89 |    0.2% |           17.06 |            3.33 |  5.124 |           0.12 |    0.0% |      0.01 | `Pow_Bmw512_1024b`
    |                6.24 |      160,132,129.57 |    0.0% |          118.36 |           27.74 |  4.267 |           0.25 |    0.4% |      0.01 | `Pow_Cubehash512_1024b`
    |               14.64 |       68,308,839.95 |    0.0% |          261.17 |           65.02 |  4.017 |           1.63 |    0.6% |      0.01 | `Pow_Echo512_1024b`
    |                8.72 |      114,678,066.51 |    0.1% |          169.16 |           38.73 |  4.368 |           0.25 |    0.4% |      0.01 | `Pow_Groestl512_1024b`
    |                8.87 |      112,774,410.39 |    0.1% |          153.89 |           39.38 |  3.907 |           0.15 |    0.6% |      0.01 | `Pow_Jh512_1024b`
    |                3.79 |      264,078,320.88 |    0.1% |           81.14 |           16.81 |  4.826 |           0.23 |    0.0% |      0.01 | `Pow_Keccak512_1024b`
    |                6.87 |      145,603,041.03 |    0.1% |          131.55 |           30.46 |  4.319 |           1.11 |    0.1% |      0.01 | `Pow_Luffa512_1024b`
    |                7.54 |      132,578,659.24 |    0.1% |          119.56 |           33.48 |  3.572 |           0.26 |    0.4% |      0.01 | `Pow_Shavite512_1024b`
    |                7.24 |      138,072,734.74 |    0.5% |          129.58 |           32.09 |  4.037 |           2.38 |    0.0% |      0.01 | `Pow_Simd512_1024b`
    |                1.17 |      857,609,484.77 |    0.0% |           20.67 |            5.18 |  3.992 |           0.15 |    0.0% |      0.01 | `Pow_Skein512_1024b`
    |               11.45 |       87,351,960.89 |    0.0% |          195.23 |           50.85 |  3.840 |           1.30 |    0.1% |      0.01 | `Pow_X11_1024b`
    
  • This pull request also makes changes that result in performance impact, the baseline measurement is given below

    Build parameters: LDFLAGS="-Wl,--as-needed -Wl,-O2" CC=clang-16 CXX=clang++-16 ./configure --prefix=$(pwd)/depends/x86_64-linux-gnu --enable-reduce-exports --disable-tests --disable-gui-tests --disable-fuzz --disable-fuzz-binary --disable-ccache --disable-maintainer-mode --disable-dependency-tracking --without-gui

    Baseline benchmarks (489c5f0):
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |              321.49 |        3,110,502.97 |    0.1% |        5,527.82 |        1,427.78 |  3.872 |          38.97 |    0.1% |      0.01 | `X11_0032b_single`
    |              128.63 |        7,774,397.65 |    0.1% |        2,211.03 |          571.22 |  3.871 |          15.56 |    0.1% |      0.01 | `X11_0080b_single`
    |               81.97 |       12,199,203.11 |    0.1% |        1,404.79 |          364.00 |  3.859 |           9.80 |    0.1% |      0.01 | `X11_0128b_single`
    |               21.55 |       46,394,313.29 |    0.1% |          368.03 |           95.73 |  3.844 |           2.51 |    0.1% |      0.01 | `X11_0512b_single`
    |               11.48 |       87,085,396.93 |    0.1% |          195.23 |           51.00 |  3.828 |           1.30 |    0.1% |      0.01 | `X11_1024b_single`
    |                1.43 |      700,791,473.89 |    0.1% |           22.61 |            6.33 |  3.572 |           0.09 |    0.0% |      0.02 | `X11_1M`
    |                6.45 |      155,097,633.83 |    0.0% |          108.83 |           28.64 |  3.801 |           0.69 |    0.1% |      0.01 | `X11_2048b_single`
    

    Enablement of small footprint for JH, CubeHash (~10% improvement)

    Benchmarks (30bf19d):
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |              291.68 |        3,428,360.28 |    0.1% |        5,135.01 |        1,294.73 |  3.966 |          41.91 |    0.1% |      0.01 | `Pow_X11_0032b`
    |              117.46 |        8,513,845.07 |    0.1% |        2,053.90 |          518.66 |  3.960 |          16.74 |    0.1% |      0.01 | `Pow_X11_0080b`
    |               75.22 |       13,293,803.11 |    0.1% |        1,305.23 |          332.11 |  3.930 |          10.53 |    0.1% |      0.01 | `Pow_X11_0128b`
    |               19.65 |       50,880,118.90 |    0.0% |          342.12 |           86.79 |  3.942 |           2.70 |    0.1% |      0.01 | `Pow_X11_0512b`
    |               10.46 |       95,567,763.92 |    0.1% |          181.60 |           46.21 |  3.930 |           1.39 |    0.1% |      0.01 | `Pow_X11_1024b`
    |                1.20 |      835,966,628.21 |    0.4% |           21.24 |            5.28 |  4.023 |           0.09 |    0.0% |      0.01 | `Pow_X11_1M`
    |                5.83 |      171,499,428.31 |    0.2% |          101.34 |           25.75 |  3.936 |           0.74 |    0.1% |      0.01 | `Pow_X11_2048b`
    

Breaking Changes

None expected.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation (note: N/A)
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@kwvg kwvg added this to the 23 milestone Aug 2, 2025
@github-actions
Copy link

github-actions bot commented Aug 2, 2025

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

@kwvg kwvg changed the title crypto: assume true 64-bit target, add benchmarks for constituent hashes, implement AES-NI backend and dispatcher for Echo512 and Shavite512 crypto: assume true 64-bit target, add benchmarks for constituent hashes, apply stronger optimizations Aug 12, 2025
@kwvg kwvg marked this pull request as ready for review August 13, 2025 20:07
@coderabbitai
Copy link

coderabbitai bot commented Aug 13, 2025

Walkthrough

Adds and exports SPHLIB_CFLAGS via configure.ac and enables -O3 for non-debug sphlib builds. Introduces a new static SPH crypto library (crypto/libbitcoin_crypto_sph.la), wires its sources and flags into LIBBITCOIN_CRYPTO and libdashconsensus, and adds SPH-specific CPPFLAGS. Adds a new benchmark file (src/bench/pow_hash.cpp) and registers it; removes X11 benchmarks from src/bench/crypto_hash.cpp. Removes the JH-512 implementation file. Multiple sph_* headers and sph_types.h were refactored: void* contexts replaced by typed context pointers, extern "C" and SPH_64 conditionals simplified/removed, and types/macros modernized. Minor change in hash_x11.h pblank initialization.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90 minutes

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/crypto/x11/sph_cubehash.h (1)

76-114: Action required: document vendored edits to src/crypto/x11/sph_cubehash.h and confirm C-linkage safety

Short verification summary:

  • I inspected src/crypto/x11/sph_cubehash.h (no extern "C" present).
  • No .c translation units include any sph_*.h headers; sph headers are only included by C++ TUs.
  • No legacy void* prototypes for the sph_*512 APIs were found.

Files to note:

  • src/crypto/x11/sph_cubehash.h (vendored header — changed linkage/signature surface)
  • Consumers found: src/crypto/x11/cubehash.cpp (implementation), src/hash_x11.h (uses ctx & calls), src/bench/pow_hash.cpp

Required actions (pick one):

  • Add a vendor/patch-notes entry documenting why this vendored file was edited and the intended vendor-sync strategy; OR
  • If editing vendor code is disallowed, revert the change and implement any C++-specific API surface outside the vendored tree; OR
  • If C translation units need to consume these APIs now or later, provide a non-vendor C-compatible wrapper (reintroducing extern "C") rather than modifying the vendored header.

Reason: edits touch a vendored path — even though current usage is C++-only, the repo policy and future C TU usage require explicit documentation or a safer wrapper approach.

♻️ Duplicate comments (4)
src/crypto/x11/sph_shavite.h (1)

99-116: close/addbits_and_close prototypes updated correctly

The change to sph_shavite512_context* aligns with implementation and documented reinit behavior post-close.

The same vendored-path and extern "C" considerations from src/crypto/x11/sph_cubehash.h apply here.

src/crypto/x11/sph_bmw.h (1)

97-114: Finalization prototypes updated to typed contexts

sph_bmw512_close and addbits_and_close now take sph_bmw512_context*, matching the implementation and other 512-bit APIs.

Vendored-path and extern "C" linkage notes from the CubeHash review apply here as well.

src/crypto/x11/sph_simd.h (1)

97-114: close and addbits_and_close updated consistently

The finalization functions now use sph_simd512_context*. Behavior matches implementation delegating from close->addbits_and_close.

Same vendor/linkage caveats as noted previously.

src/crypto/x11/sph_skein.h (1)

101-117: Finalization functions updated to typed context pointer

sph_skein512_close and addbits_and_close use sph_skein512_context*, aligning with implementation and other algorithms’ 512-bit API.

Vendor policy and extern "C" considerations from earlier comments apply here too.

🧹 Nitpick comments (9)
test/lint/lint-include-guards.py (1)

29-29: Broadened exclusion to src/crypto/x11: verify we’re not masking real issues

Excluding the entire directory will skip include-guard checks for all X11 headers. If this is temporary to accommodate in-flight refactors, consider tracking a follow-up to narrow the scope later or to fix guards and re-enable checks.

If you want to tighten semantics to true “prefix” matching (and avoid accidental substring matches), refactor the filter to use startswith() against normalized paths. I can draft a patch if desired.

test/lint/lint-whitespace.py (1)

21-21: Exclude src/crypto/x11 from whitespace lint: sanity-check necessity and duration

This aligns with the include-guard exclusion. Please confirm this is intended (e.g., to land a large vendor-ish import or transitional formatting) and plan to re-enable once stable, to keep lint coverage high.

src/bench/pow_hash.cpp (3)

47-47: Pass the hash buffer, not the array address.

You currently pass &hash (type uint8_t ()[64]) to the close functions. While it implicitly converts to void, it’s non-idiomatic and can be confusing. Pass hash (decays to uint8_t*) or &hash[0] for clarity.

Apply this diff to all close calls:

-        sph_blake512_close(&ctx, &hash);
+        sph_blake512_close(&ctx, hash);
-        sph_bmw512_close(&ctx, &hash);
+        sph_bmw512_close(&ctx, hash);
-        sph_cubehash512_close(&ctx, &hash);
+        sph_cubehash512_close(&ctx, hash);
-        sph_echo512_close(&ctx, &hash);
+        sph_echo512_close(&ctx, hash);
-        sph_groestl512_close(&ctx, &hash);
+        sph_groestl512_close(&ctx, hash);
-        sph_jh512_close(&ctx, &hash);
+        sph_jh512_close(&ctx, hash);
-        sph_keccak512_close(&ctx, &hash);
+        sph_keccak512_close(&ctx, hash);
-        sph_luffa512_close(&ctx, &hash);
+        sph_luffa512_close(&ctx, hash);
-        sph_shavite512_close(&ctx, &hash);
+        sph_shavite512_close(&ctx, hash);
-        sph_simd512_close(&ctx, &hash);
+        sph_simd512_close(&ctx, hash);
-        sph_skein512_close(&ctx, &hash);
+        sph_skein512_close(&ctx, hash);

Also applies to: 59-59, 71-71, 83-83, 95-95, 107-107, 119-119, 131-131, 143-143, 155-155, 167-167


39-169: Optional: reduce duplication with a small generic helper.

All Pow_*512 functions share the same pattern. A templated helper taking init/update/close callables (or function pointers) plus context type would reduce boilerplate and make future additions less error-prone, without affecting benchmark fidelity.


29-37: Optional: guard against over-optimization in microbench.

Writes to a local buffer via external functions are unlikely to be optimized away, but for belt-and-suspenders, consider calling ancl::nanobench::doNotOptimizeAway(hash) (or the bench helper, if exposed) after the run to pin the side effect.

src/crypto/x11/groestl.cpp (1)

2432-2459: Nit: [[maybe_unused]] on ‘buf’ while it’s used.

buf is assigned and subsequently used (for address derivation earlier). Either remove [[maybe_unused]] or drop the unused assignment entirely to avoid confusion. Given this is vendored, consider leaving as-is unless you’re already touching this file.

src/crypto/x11/jh.cpp (1)

1-1: Update the file header comment to reflect the C++ file.

The file identifier comment still references jh.c but this is now a C++ file (jh.cpp).

-/* $Id: jh.c 255 2011-06-07 19:50:20Z tp $ */
+/* $Id: jh.cpp 255 2011-06-07 19:50:20Z tp $ */
src/crypto/x11/aes_helper.h (1)

1-1: Update the file header comment to reflect the header file.

The file identifier comment references aes_helper.c but this is a header file (aes_helper.h).

-/* $Id: aes_helper.c 220 2010-06-09 09:21:50Z tp $ */
+/* $Id: aes_helper.h 220 2010-06-09 09:21:50Z tp $ */
src/crypto/x11/sph_types.h (1)

389-394: “Aligned” helpers no longer differ from unaligned implementations

The aligned variants currently call the same Write*/Read* routines as the unaligned versions (which use memcpy internally and don’t depend on alignment). That’s fine for correctness, but the docstrings still state alignment requirements. Consider updating the comments to avoid confusion.

Also applies to: 414-418, 440-444, 465-468, 490-494, 515-518, 540-544, 565-568

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 489c5f0 and 941948e.

📒 Files selected for processing (34)
  • configure.ac (2 hunks)
  • src/Makefile.am (3 hunks)
  • src/Makefile.bench.include (1 hunks)
  • src/bench/crypto_hash.cpp (0 hunks)
  • src/bench/pow_hash.cpp (1 hunks)
  • src/crypto/common.h (1 hunks)
  • src/crypto/x11/aes_helper.c (0 hunks)
  • src/crypto/x11/aes_helper.h (1 hunks)
  • src/crypto/x11/blake.cpp (2 hunks)
  • src/crypto/x11/bmw.cpp (1 hunks)
  • src/crypto/x11/cubehash.cpp (1 hunks)
  • src/crypto/x11/echo.cpp (2 hunks)
  • src/crypto/x11/groestl.cpp (2 hunks)
  • src/crypto/x11/jh.c (0 hunks)
  • src/crypto/x11/jh.cpp (1 hunks)
  • src/crypto/x11/keccak.cpp (1 hunks)
  • src/crypto/x11/luffa.cpp (7 hunks)
  • src/crypto/x11/shavite.cpp (2 hunks)
  • src/crypto/x11/simd.cpp (2 hunks)
  • src/crypto/x11/skein.cpp (1 hunks)
  • src/crypto/x11/sph_blake.h (4 hunks)
  • src/crypto/x11/sph_bmw.h (4 hunks)
  • src/crypto/x11/sph_cubehash.h (4 hunks)
  • src/crypto/x11/sph_echo.h (4 hunks)
  • src/crypto/x11/sph_groestl.h (4 hunks)
  • src/crypto/x11/sph_jh.h (4 hunks)
  • src/crypto/x11/sph_keccak.h (4 hunks)
  • src/crypto/x11/sph_luffa.h (4 hunks)
  • src/crypto/x11/sph_shavite.h (4 hunks)
  • src/crypto/x11/sph_simd.h (4 hunks)
  • src/crypto/x11/sph_skein.h (4 hunks)
  • src/crypto/x11/sph_types.h (19 hunks)
  • test/lint/lint-include-guards.py (1 hunks)
  • test/lint/lint-whitespace.py (1 hunks)
💤 Files with no reviewable changes (3)
  • src/bench/crypto_hash.cpp
  • src/crypto/x11/jh.c
  • src/crypto/x11/aes_helper.c
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/crypto/common.h
  • src/crypto/x11/sph_cubehash.h
  • src/crypto/x11/sph_simd.h
  • src/crypto/x11/sph_luffa.h
  • src/crypto/x11/sph_bmw.h
  • src/crypto/x11/sph_keccak.h
  • src/bench/pow_hash.cpp
  • src/crypto/x11/skein.cpp
  • src/crypto/x11/cubehash.cpp
  • src/crypto/x11/sph_shavite.h
  • src/crypto/x11/sph_skein.h
  • src/crypto/x11/jh.cpp
  • src/crypto/x11/aes_helper.h
  • src/crypto/x11/luffa.cpp
  • src/crypto/x11/sph_groestl.h
  • src/crypto/x11/sph_blake.h
  • src/crypto/x11/sph_echo.h
  • src/crypto/x11/bmw.cpp
  • src/crypto/x11/sph_jh.h
  • src/crypto/x11/simd.cpp
  • src/crypto/x11/shavite.cpp
  • src/crypto/x11/keccak.cpp
  • src/crypto/x11/groestl.cpp
  • src/crypto/x11/sph_types.h
  • src/crypto/x11/blake.cpp
  • src/crypto/x11/echo.cpp
src/crypto/{ctaes,x11}/**

📄 CodeRabbit Inference Engine (CLAUDE.md)

Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11

Files:

  • src/crypto/x11/sph_cubehash.h
  • src/crypto/x11/sph_simd.h
  • src/crypto/x11/sph_luffa.h
  • src/crypto/x11/sph_bmw.h
  • src/crypto/x11/sph_keccak.h
  • src/crypto/x11/skein.cpp
  • src/crypto/x11/cubehash.cpp
  • src/crypto/x11/sph_shavite.h
  • src/crypto/x11/sph_skein.h
  • src/crypto/x11/jh.cpp
  • src/crypto/x11/aes_helper.h
  • src/crypto/x11/luffa.cpp
  • src/crypto/x11/sph_groestl.h
  • src/crypto/x11/sph_blake.h
  • src/crypto/x11/sph_echo.h
  • src/crypto/x11/bmw.cpp
  • src/crypto/x11/sph_jh.h
  • src/crypto/x11/simd.cpp
  • src/crypto/x11/shavite.cpp
  • src/crypto/x11/keccak.cpp
  • src/crypto/x11/groestl.cpp
  • src/crypto/x11/sph_types.h
  • src/crypto/x11/blake.cpp
  • src/crypto/x11/echo.cpp
src/bench/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit Inference Engine (CLAUDE.md)

Performance benchmarks should be placed in src/bench/ and use nanobench

Files:

  • src/bench/pow_hash.cpp
🧠 Learnings (2)
📚 Learning: 2025-07-20T18:42:49.794Z
Learnt from: CR
PR: dashpay/dash#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/bench/**/*.{cpp,h,cc,cxx,hpp} : Performance benchmarks should be placed in src/bench/ and use nanobench

Applied to files:

  • src/Makefile.bench.include
📚 Learning: 2025-07-23T09:30:34.631Z
Learnt from: kwvg
PR: dashpay/dash#6761
File: src/chainlock/signing.h:5-6
Timestamp: 2025-07-23T09:30:34.631Z
Learning: Dash Core uses BITCOIN_ prefix for header guards as the standard convention, inherited from Bitcoin Core. Only a few BLS-specific files in src/bls/ use DASH_ prefix. The vast majority of files (385+) use BITCOIN_ prefix.

Applied to files:

  • src/Makefile.am
🧬 Code Graph Analysis (13)
src/crypto/x11/sph_cubehash.h (1)
src/crypto/x11/cubehash.cpp (6)
  • sph_cubehash512_init (561-565)
  • sph_cubehash512_init (562-562)
  • sph_cubehash512 (568-572)
  • sph_cubehash512 (569-569)
  • sph_cubehash512_close (575-579)
  • sph_cubehash512_close (576-576)
src/crypto/x11/sph_simd.h (1)
src/crypto/x11/simd.cpp (6)
  • sph_simd512_init (1089-1093)
  • sph_simd512_init (1090-1090)
  • sph_simd512 (1095-1099)
  • sph_simd512 (1096-1096)
  • sph_simd512_close (1101-1105)
  • sph_simd512_close (1102-1102)
src/crypto/x11/sph_luffa.h (1)
src/crypto/x11/luffa.cpp (6)
  • sph_luffa512_init (688-696)
  • sph_luffa512_init (689-689)
  • sph_luffa512 (699-703)
  • sph_luffa512 (700-700)
  • sph_luffa512_close (706-710)
  • sph_luffa512_close (707-707)
src/crypto/x11/sph_bmw.h (1)
src/crypto/x11/bmw.cpp (6)
  • sph_bmw512_init (489-493)
  • sph_bmw512_init (490-490)
  • sph_bmw512 (496-500)
  • sph_bmw512 (497-497)
  • sph_bmw512_close (503-507)
  • sph_bmw512_close (504-504)
src/crypto/x11/sph_keccak.h (1)
src/crypto/x11/keccak.cpp (5)
  • sph_keccak512_init (1016-1019)
  • sph_keccak512 (1022-1026)
  • sph_keccak512 (1023-1023)
  • sph_keccak512_close (1029-1033)
  • sph_keccak512_close (1030-1030)
src/bench/pow_hash.cpp (2)
src/hash_x11.h (1)
  • HashX11 (26-90)
src/crypto/x11/keccak.cpp (1)
  • sph_keccak512_init (1016-1019)
src/crypto/x11/sph_shavite.h (1)
src/crypto/x11/shavite.cpp (6)
  • sph_shavite512_init (932-936)
  • sph_shavite512_init (933-933)
  • sph_shavite512 (939-943)
  • sph_shavite512 (940-940)
  • sph_shavite512_close (946-951)
  • sph_shavite512_close (947-947)
src/crypto/x11/sph_skein.h (1)
src/crypto/x11/skein.cpp (6)
  • sph_skein512_init (673-677)
  • sph_skein512_init (674-674)
  • sph_skein512 (680-684)
  • sph_skein512 (681-681)
  • sph_skein512_close (687-691)
  • sph_skein512_close (688-688)
src/crypto/x11/sph_groestl.h (1)
src/crypto/x11/groestl.cpp (6)
  • sph_groestl512_init (2462-2466)
  • sph_groestl512_init (2463-2463)
  • sph_groestl512 (2469-2473)
  • sph_groestl512 (2470-2470)
  • sph_groestl512_close (2476-2480)
  • sph_groestl512_close (2477-2477)
src/crypto/x11/sph_blake.h (1)
src/crypto/x11/blake.cpp (6)
  • sph_blake512_init (609-613)
  • sph_blake512_init (610-610)
  • sph_blake512 (616-620)
  • sph_blake512 (617-617)
  • sph_blake512_close (623-627)
  • sph_blake512_close (624-624)
src/crypto/x11/sph_echo.h (1)
src/crypto/x11/echo.cpp (6)
  • sph_echo512_init (406-410)
  • sph_echo512_init (407-407)
  • sph_echo512 (413-417)
  • sph_echo512 (414-414)
  • sph_echo512_close (420-424)
  • sph_echo512_close (421-421)
src/crypto/x11/sph_jh.h (1)
src/crypto/x11/jh.cpp (6)
  • sph_jh512_init (467-471)
  • sph_jh512_init (468-468)
  • sph_jh512 (474-478)
  • sph_jh512 (475-475)
  • sph_jh512_close (481-485)
  • sph_jh512_close (482-482)
src/crypto/x11/sph_types.h (2)
src/chainlock/signing.h (1)
  • int32_t (28-87)
src/crypto/common.h (12)
  • WriteBE16 (73-77)
  • ReadBE16 (52-57)
  • WriteLE16 (34-38)
  • ReadLE16 (13-18)
  • WriteBE32 (79-83)
  • ReadBE32 (59-64)
  • WriteLE32 (40-44)
  • ReadLE32 (20-25)
  • WriteBE64 (85-89)
  • ReadBE64 (66-71)
  • WriteLE64 (46-50)
  • ReadLE64 (27-32)
🔇 Additional comments (95)
src/crypto/common.h (1)

73-78: Add BE16 writer: consistent, correct, and matches existing pattern

Implementation mirrors the LE/BE 32/64 helpers (htobe16_internal + memcpy), which is the established, alias-safe pattern in this header. Good addition.

src/Makefile.bench.include (1)

46-46: Confirmed: bench/bench_dash links SPH via LIBBITCOIN_CRYPTO — no change required

Short summary: pow_hash.cpp is built into bench/bench_dash and bench/bench_dash is linked against $(LIBBITCOIN_CRYPTO). src/Makefile.am adds crypto/libbitcoin_crypto_sph.la to LIBBITCOIN_CRYPTO and the sph lib includes SPH sources/flags, so SPH symbols are transitively available to the bench binary.

Relevant locations:

  • src/Makefile.bench.include
    • bench_bench_dash_SOURCES includes bench/pow_hash.cpp
    • bench_bench_dash_LDADD includes $(LIBBITCOIN_CRYPTO)
  • src/Makefile.am
    • LIBBITCOIN_CRYPTO_SPH = crypto/libbitcoin_crypto_sph.la
    • LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SPH)
    • crypto_libbitcoin_crypto_sph_la_CXXFLAGS includes $(SPHLIB_CXXFLAGS) and the sph sources/headers are listed

Result: no action required; SPH objects are already linked into the bench target.

configure.ac (1)

1880-1881: AC_SUBST(SPHLIB_CXXFLAGS) export: good plumbing

This ensures the per-SPH flags propagate to generated Makefiles. Combined with the guarded -O3, this gives fine-grained control without impacting global CXXFLAGS.

src/crypto/x11/sph_cubehash.h (2)

97-114: close/addbits_and_close prototypes updated consistently

The close and addbits_and_close prototypes now use sph_cubehash512_context*, consistent with the rest of the API and the implementation. Reinitialization semantics via the close path match the implementation.


76-86: Typed context migration for CubeHash-512 API looks good

I searched the repo — header, definitions, and callers have been updated to use sph_cubehash512_context*; no legacy prototypes using void *cc remain.

Files of interest:

  • src/crypto/x11/sph_cubehash.h — typedef/prototypes updated (lines: 67, 76, 86, 97, 112-113).
  • src/crypto/x11/cubehash.cpp — function definitions use sph_cubehash512_context *cc (lines: 562, 569, 576, 583).
  • Call sites:
    • src/hash_x11.h — ctx declaration and calls (lines: 36, 73–75).
    • src/bench/pow_hash.cpp — ctx and calls (lines: 65, 69–71).

No further changes required.

src/crypto/x11/sph_shavite.h (1)

78-89: Typed-context migration for SHAvite-512 is consistent — no action required

Header, implementation and callers all use sph_shavite512_context*; verified locations:

  • src/crypto/x11/sph_shavite.h — declarations use sph_shavite512_context*
  • src/crypto/x11/shavite.cpp — implementations match the header
  • src/hash_x11.h and src/bench/pow_hash.cpp — callers updated to pass &ctx_shavite

No fixes needed.

src/crypto/x11/sph_bmw.h (1)

76-87: LGTM — BMW-512 uses typed context pointer; no action required

Verified that the header, implementation, and callers are consistent: the API uses sph_bmw512_context* and all uses match.

  • src/crypto/x11/sph_bmw.h — typedef + prototypes (sph_bmw512_init, sph_bmw512, sph_bmw512_close, sph_bmw512_addbits_and_close)
  • src/crypto/x11/bmw.cpp — matching function definitions (wrap bmw64_* implementations)
  • src/hash_x11.h — caller usage (sph_bmw512_init(&ctx_bmw), sph_bmw512(...), sph_bmw512_close(...))
  • src/bench/pow_hash.cpp — bench usage (sph_bmw512_context ctx; sph_bmw512_init(&ctx); ...)
src/crypto/x11/sph_simd.h (1)

76-87: SIMD-512 typed context parameterization looks correct — no outliers found

Declaration and implementation use sph_simd512_context* and the callsites pass a pointer (e.g., &ctx_simd, &ctx, cc).

Files/locations checked:

  • src/crypto/x11/sph_simd.h:76 — declaration: void sph_simd512_init(sph_simd512_context *cc);
  • src/crypto/x11/simd.cpp:1090 — implementation signature
  • src/crypto/x11/simd.cpp:1111 — call: sph_simd512_init(cc);
  • src/hash_x11.h:81 — call: sph_simd512_init(&ctx_simd);
  • src/bench/pow_hash.cpp:153 — call: sph_simd512_init(&ctx);
src/crypto/x11/sph_skein.h (1)

80-90: Approve — Skein-512 typed context adoption is consistent and callers pass addresses

Verified with the provided grep output: declaration and implementation use sph_skein512_context* and init callers pass the address of context variables, so no changes needed.

  • src/crypto/x11/sph_skein.h:80 — declaration (init) and update prototype present
  • src/crypto/x11/skein.cpp:674 — init signature; 698 — internal call
  • src/hash_x11.h:57 — sph_skein512_init(&ctx_skein);
  • src/bench/pow_hash.cpp:165 — sph_skein512_init(&ctx);
src/crypto/x11/sph_luffa.h (3)

62-62: Typed context API for Luffa-512 looks correct and consistent.

Signatures now use sph_luffa512_context* and align with the corresponding implementation in luffa.cpp. The change removes void* ambiguity and is a net improvement.

Also applies to: 72-72, 83-83, 99-99


74-84: Double-check doc promise: “context is automatically reinitialized.”

Per luffa.cpp, sph_luffa512_close delegates to addbits_and_close, and addbits_and_close reinitializes the context. This matches the header’s contract. Please ensure all other 512-bit alg headers maintain the same documented behavior for consistency.

Also applies to: 86-99


39-41: C linkage removed — verify no C translation units include this header.

With extern "C" removed, these symbols will have C++ linkage. Confirm that all inclusion sites compile as C++ (which is the case across this PR).

src/crypto/x11/sph_blake.h (3)

78-78: Typed context API for BLAKE-512 is consistent with blake.cpp.

The updated prototypes match the implementation (blake.cpp lines 608–626), and removing void* improves type safety.

Also applies to: 88-88, 99-99, 115-115


46-47: SPH_64 gating removed — confirm only 64-bit targets build these headers.

Given PR scope (assume true 64-bit target), defining SPH_SIZE_blake512 unconditionally is fine. Just ensure all build configs set/assume 64-bit, since BLAKE-512 X11 path depends on it.


40-41: C linkage removed — verify all inclusions compile as C++.

As with other sph_* headers in this PR, ensure there are no remaining C-object files depending on C linkage for these names.

src/bench/pow_hash.cpp (1)

31-37: No action needed — bench target already links test/util (LIBTEST_UTIL)

Verified: src/Makefile.bench.include wires $(LIBTEST_UTIL) into bench_bench_dash_LDADD, and test/util sources are present, so instantiating TestingSetup in bench files is already supported.

Files to note:

  • src/Makefile.bench.include — bench_bench_dash_LDADD includes $(LIBTEST_UTIL) (around lines 61–66).
  • src/Makefile.test_util.include — contains test/util sources/headers (around lines 21–23).
  • src/bench/pow_hash.cpp — calls MakeNoLogFileContext() (lines 31–33).
src/crypto/x11/skein.cpp (1)

674-699: Typed context migration for Skein-512 is correct; reinit on close is preserved.

The new signatures match the header, and addbits_and_close reinitializes via sph_skein512_init(cc), preserving prior semantics.

src/crypto/x11/groestl.cpp (1)

2463-2487: Typed context 512-bit API looks good and matches the new headers.

The 64-bit-only pathway and sph_groestl512_* functions align with the PR’s 64-bit assumption and header changes.

src/crypto/x11/shavite.cpp (5)

933-936: API migration to typed context pointer looks correct (init)

sph_shavite512_init now strongly types the context. This improves type safety and matches the broader refactor. No issues spotted.


940-943: Typed update signature is consistent and correct

sph_shavite512 takes a typed context and forwards to shavite_big_core as expected. Looks good.


947-951: Close + reinit flow preserved with typed context

sph_shavite512_close forwards to shavite_big_close and reinitializes with IV512. Behavior matches typical SPH semantics. Good change.


955-959: Addbits-and-close API matches typed migration and resets state

Signature and flow are consistent with the refactor. Good to go.


60-60: aes_helper.h defines AES_ROUND_NOKEY_LE and is header-only-safe

Verified: AES_ROUND_NOKEY_LE is present as an inline function and there are no obvious non-inline function definitions in the header, so including it from multiple TUs should not cause ODR/link errors.

  • File: src/crypto/x11/aes_helper.h — inline definition at lines 148–151:
    inline void AES_ROUND_NOKEY_LE(const sph_u32 x0, const sph_u32 x1, const sph_u32 x2, const sph_u32 x3,
    sph_u32& y0, sph_u32& y1, sph_u32& y2, sph_u32& y3)
    { return AES_ROUND_LE(x0, x1, x2, x3, /k0=/0, /k1=/0, /k2=/0, /k3=/0, y0, y1, y2, y3); }
src/crypto/x11/sph_echo.h (4)

79-79: Strongly-typed echo512 context in the public API is a solid improvement

The move from void* to sph_echo512_context* improves compile-time safety for callers. No concerns here.


89-89: Update API uses typed context for processing — consistent with implementation

The data processing function signature aligns with echo.cpp usage. Looks correct.


100-100: Close API now typed — consistent with bench and impl call sites

Signature is consistent with the implementation shown in echo.cpp. Good.


115-116: Addbits-and-close API typed; C++ linkage is safe — no in-repo C consumers found

Signature is correct. I searched the repo: there are no extern "C" wrappers under src/crypto/x11 and no C (.c) translation units include sph_*.h, so the SPH headers having C++ linkage should not break in-repo C consumers.

  • File under review

    • src/crypto/x11/sph_echo.h (lines 115–116):
      void sph_echo512_addbits_and_close(
      sph_echo512_context *cc, unsigned ub, unsigned n, void *dst);
  • Verification summary

    • repo-wide search found extern "C" occurrences in other areas (configure.ac, src/stacktraces.cpp, third-party headers), but none in src/crypto/x11.
    • No .c files include sph_.h (no matches for #include "sph_.h" in .c files).

Conclusion: no fix required here; resolving this review comment.

src/crypto/x11/bmw.cpp (5)

483-483: Use of static_cast for out pointer is appropriate in C++

Casting void* to unsigned char* with static_cast keeps the implementation idiomatic for C++. No issues.


490-493: bmw512_init: typed context pointer and IV initialization look correct

Forwarding to bmw64_init(cc, IV512) with a typed context aligns with the refactor.


497-500: bmw512 update: signature migration is correct

bmw64(cc, data, len) is unmodified; using sph_bmw512_context* is consistent. LGTM.


504-507: bmw512_close forwards to addbits-and-close — clean and reduces duplication

Pattern is consistent across other algorithms. Looks good.


511-515: bmw512_addbits-and-close: typed signature and reinit behavior are correct

Closes, outputs, then re-initializes state — expected SPH behavior. No issues.

src/crypto/x11/cubehash.cpp (5)

555-555: C++-style cast for out pointer is correct

Switching to static_cast<unsigned char*> is the right choice here. No functional change otherwise.


562-565: cubehash512_init: typed context pointer migration looks good

Initialization path unchanged and correct under the new signature.


569-572: cubehash512 update: signature migration consistent

cubehash_core call is unchanged; typed context improves safety. No issues.


576-579: cubehash512_close: delegates to addbits-and-close — consistent pattern

Refactor reduces duplication. LGTM.


583-587: cubehash512_addbits-and-close: typed context and reinit pattern are correct

The function resets state after finalization as expected. Looks good.

src/crypto/x11/sph_jh.h (4)

78-78: Typed JH-512 init API improves type safety

The updated signature aligns with jh.cpp implementation. No issues.


88-88: Typed JH-512 update signature is consistent with implementation

Forwarding to jh_core with a typed context, as in jh.cpp, looks correct.


99-99: Typed JH-512 close signature is correct and aligns with jh.cpp

Matches jh_close(cc, 0, 0, dst, 16, IV512) pattern noted in the implementation.


115-116: sph_jh512: ABI & 64-bit-layout — verification

Short summary: I checked the repo. Prototypes are typed (sph_jh512_context*), no void*-based sph_jh512 prototypes remain, and all in-repo callsites are C++ — but the header has no extern "C" guard and the implementation operates on H.wide (sph_u64), so the context layout assumes 64-bit. If you have external C consumers or need 32-bit compatibility, this is a potential ABI break.

Findings / locations:

  • Declaration / context:
    • src/crypto/x11/sph_jh.h — typed prototypes (sph_jh512_init, sph_jh512, sph_jh512_close, sph_jh512_addbits_and_close) and sph_jh512_context typedef (uses union { sph_u64 wide[16]; sph_u32 narrow[32]; } and sph_u64 block_count).
  • Definitions:
    • src/crypto/x11/jh.cpp — definitions for sph_jh512_* (uses H.wide in READ_STATE/WRITE_STATE and jh_close/jh_init).
  • Call sites (all in-repo, C++):
    • src/hash_x11.h — sph_jh512_init(&ctx_jh), sph_jh512(...), sph_jh512_close(&ctx_jh, static_cast<void*>(&hash[4])).
    • src/bench/pow_hash.cpp — sph_jh512_init(&ctx); sph_jh512(...); sph_jh512_close(&ctx, &hash).
  • Search: no matches for void-based sph_jh512_* prototypes were found in the repository.

What to do (brief):

  • If there are any external C consumers, re‑introduce extern "C" around the declarations or provide a C wrapper to preserve the C ABI.
  • If 32-bit builds must be supported, verify/restore a 32-bit-compatible context layout or add compatibility shims/document that this API now requires 64-bit contexts.

Tag:

src/crypto/x11/keccak.cpp (5)

1016-1019: Typed Keccak-512 init: API migration looks correct.

Switching to sph_keccak512_context* aligns with the new header and the 64-bit-only path.


1023-1026: Typed Keccak-512 update: OK.

Directly forwarding to keccak_core with lim=72 is consistent with Keccak-512 rate.


1030-1033: Typed Keccak-512 close: OK.

Delegation to addbits-and-close preserves behavior.


1037-1040: Typed Keccak-512 addbits-and-close: OK.

Matches the DEFCLOSE(64,72) implementation.


1016-1040: No C translation units call the 512 APIs — removing extern "C" wrappers is safe

Search of .c translation units found no includes of sph_keccak|groestl|echo|luffa headers and no calls to sph_*512 symbols. Only occurrence is a comment in src/crypto/x11/aes_helper.h referencing aes_helper.c (harmless).

  • Verified: no .c files reference sph_keccak/groestl/echo/luffa 512 APIs.
  • Location of C++ definitions: src/crypto/x11/keccak.cpp (sph_keccak512, sph_keccak512_close, sph_keccak512_addbits_and_close).
  • Action: no repo changes required. If external C consumers exist outside this repository, update them or restore a C-compatible header/extern "C" wrapper as needed.
src/crypto/x11/sph_groestl.h (5)

78-79: Typed Groestl-512 init prototype: OK.

Matches the updated implementation in groestl.cpp.


88-89: Typed Groestl-512 update prototype: OK.

Signature and constness are appropriate.


99-100: Typed Groestl-512 close prototype: OK.


114-116: Typed Groestl-512 addbits-and-close prototype: OK.

Consistent with the implementation.


78-116: Verify header/API changes don’t break ABI/usage — no issues found

Quick summary:

  • No C translation units include src/crypto/x11/sph_groestl.h — the header is only included from C++ sources (src/crypto/x11/groestl.cpp, src/hash_x11.h, src/bench/pow_hash.cpp).
  • The groestl context now uses a single 64-bit field (sph_u64 count) in src/crypto/x11/sph_groestl.h, and src/crypto/x11/groestl.cpp consistently uses sc->count / SPH_T64; there are no references to count_low/count_high in the Groestl implementation.
  • count_low/count_high remain in the unrelated SIMD code (src/crypto/x11/sph_simd.h and src/crypto/x11/simd.cpp).

Conclusion: removing C linkage and switching to a single 64-bit count for Groestl does not break usage in this repository — no changes required.

src/crypto/x11/echo.cpp (6)

416-417: C++-safe cast for data pointer: LGTM.

static_cast to const unsigned char* avoids C-style casts and is correct for buffer arithmetic.


407-410: Typed Echo-512 init: OK.

Aligned with sph_echo.h.


414-417: Typed Echo-512 update: OK.

Correct forwarding to echo_big_core.


421-424: Typed Echo-512 close: OK.


428-431: Typed Echo-512 addbits-and-close: OK.


52-52: Switch to header-only AES helper: OK.

aes_helper.h defines functions as constexpr inline / inline and there's no aes_helper.c present, so including the .h from multiple TUs should not cause multiple-definition issues.

  • src/crypto/x11/aes_helper.h — inline defs at lines 57 (pack_le), 65 (gf8_mul2), 87 (gf8_mul), 137 (AES_ROUND_LE), 148 (AES_ROUND_NOKEY_LE)
  • No src/crypto/x11/aes_helper.c found
  • Header included at src/crypto/x11/echo.cpp:52 and src/crypto/x11/shavite.cpp:60
src/crypto/x11/sph_keccak.h (5)

77-78: Typed Keccak-512 init prototype: OK.


87-88: Typed Keccak-512 update prototype: OK.


98-99: Typed Keccak-512 close prototype: OK.


113-115: Typed Keccak-512 addbits-and-close prototype: OK.


77-115: Confirm C linkage not required — manual verification needed

I ran a search for .c files including "sph_keccak.h" or referencing sph_keccak512 symbols; no matches were found in .c files. Absence of matches in the repo isn't definitive — please confirm no C code or external C consumers depend on these headers/symbols.

Files to check:

  • src/crypto/x11/sph_keccak.h (removed extern "C" guards)

If any C consumers exist, reintroduce extern "C" around the public declarations in that header.

src/crypto/x11/luffa.cpp (8)

73-103: Guarding non-parallel RC tables: OK.

Conditionally excluding RC00/RC14 paths is consistent with forcing the parallel variant.


123-154: Guarding additional non-parallel RC tables: OK.

RC20/RC34 are now compiled out under the forced parallel mode.


652-652: C++-safe cast in luffa5_close: OK.

static_cast<unsigned char*> is the right fix in C++.


689-696: Typed Luffa-512 init: OK.

Initialization logic unchanged, just typed context.


700-703: Typed Luffa-512 update: OK.


707-710: Typed Luffa-512 close: OK.


714-718: Typed Luffa-512 addbits-and-close: OK.

Re-init after close preserves reusability semantics.


38-38: Force-enable SPH_LUFFA_PARALLEL — confirm intent

The file hard-defines the parallel path and still contains excluded scalar branches, so the non-parallel implementation will be removed at compile time.

  • src/crypto/x11/luffa.cpp
    • Line ~38: #define SPH_LUFFA_PARALLEL 1
    • Lines 73 and 123: #if !SPH_LUFFA_PARALLEL (scalar code paths that will be excluded)

Please verify that no other files or build configurations expect the scalar path and that all target platforms/benchmarks support the parallel-only implementation.

src/crypto/x11/jh.cpp (4)

392-397: LGTM! Clean initialization implementation.

The function correctly initializes the JH context with the provided IV, resets the pointer and block counter.


399-436: LGTM! Well-structured core processing function.

The buffering logic correctly handles partial blocks and processes complete blocks with the compression function. The block counter increment is properly placed after processing.


438-464: LGTM! Correct finalization with proper padding.

The close function correctly handles padding, appends the bit length in big-endian format, and produces the final hash output from the upper half of the state as per JH-512 specification.


467-492: LGTM! Clean public API implementation.

The public API functions properly wrap the internal functions with typed context pointers, following the consistent pattern across the SPH library.

src/crypto/x11/simd.cpp (4)

1029-1034: LGTM! Consistent API update for typed contexts.

The function signature correctly uses typed sph_simd_big_context* instead of void*, and properly updates all field references from sc-> to cc->.


1037-1056: LGTM! Consistent context usage throughout update function.

All references to context fields have been properly updated to use cc->.


1070-1087: LGTM! Proper type safety with static_cast.

The finalize function correctly uses static_cast<unsigned char*> for the destination pointer, improving type safety.


1089-1112: LGTM! Clean public API with typed contexts.

The public 512-bit SIMD functions now use typed sph_simd512_context* instead of void*, following the consistent pattern across the SPH library and improving type safety.

src/crypto/x11/blake.cpp (3)

42-44: LGTM! Simplified 64-bit configuration.

The conditional compilation for SPH_COMPACT_BLAKE_64 has been simplified to only depend on SPH_SMALL_FOOTPRINT_BLAKE, removing the complex SPH_64 dependency check as the code now assumes 64-bit targets.


603-603: LGTM! Improved type safety with static_cast.

The use of static_cast<unsigned char*> for the destination pointer improves type safety.


609-635: LGTM! Consistent typed context API.

The public Blake-512 functions now use typed sph_blake512_context* instead of void*, following the consistent pattern across the SPH library and improving type safety.

src/Makefile.am (3)

76-77: LGTM! Proper integration of SPH crypto library.

The SPH crypto library is correctly added as a separate static library component.


723-756: LGTM! Well-structured SPH library configuration.

The SPH crypto library build configuration is properly set up with:

  • Static library configuration consistent with other crypto libraries
  • Optional O3 optimization via SPHLIB_CXXFLAGS
  • Small footprint flags for CubeHash and JH to optimize binary size
  • Complete list of SPH sources for X11 hashing

1022-1022: LGTM! Correct inclusion of SPH sources in consensus library.

The SPH crypto sources are properly included in the dash consensus library to ensure X11 hashing is available for consensus validation.

src/crypto/x11/aes_helper.h (5)

55-93: LGTM! Clean constexpr implementation of GF(2^8) arithmetic.

The GF(2^8) arithmetic utilities are well-implemented using modern C++ constexpr features, with proper compile-time table generation for multiplication operations.


96-110: LGTM! Correct AES S-box generation.

The constexpr S-box generation correctly implements the AES S-box using the affine transformation over GF(2^8).


117-134: LGTM! Efficient T-table implementation with validation.

The AES T-tables are correctly generated for optimized round computations, and the static_assert provides compile-time validation of the table values.


137-152: LGTM! Clean inline AES round helpers.

The AES round helper functions are properly implemented as inline functions for efficient code generation. The implementation correctly uses the precomputed T-tables for fast AES round evaluation.


46-47: Header guard OK — consistent with other SPH headers

The header guard in src/crypto/x11/aes_helper.h is SPH_AES_HELPER_H__, matching the SPH_H_ pattern used across other SPH headers. No change required.

  • Confirmed: src/crypto/x11/aes_helper.h — SPH_AES_HELPER_H__ — OK
  • Files checked (examples): src/crypto/x11/sph_types.h, sph_skein.h, sph_simd.h, sph_shavite.h, sph_luffa.h, sph_keccak.h, sph_jh.h
src/crypto/x11/sph_types.h (6)

59-62: Fixed-width type aliases look good

Switching to explicit fixed-width aliases reduces ambiguity and matches the project’s C++20 guidelines.


70-73: Good move to std::rotl/std::rotr

Using std::rotl/rotr guarantees well-defined rotation semantics across compilers and architectures.


75-75: Confirm ALWAYS_INLINE semantics across compilers

SPH_INLINE now aliases ALWAYS_INLINE. Ensure ALWAYS_INLINE expands to include “inline” (or equivalent) on all supported toolchains to preserve inlining of these tiny helpers. If ALWAYS_INLINE already contains inline/__forceinline attributes, ignore.


295-305: Byte-swap consolidation is correct

Delegating to internal_bswap_32/64 centralizes platform specifics and should be optimal with modern compilers.

Also applies to: 313-317


64-69: SPH_C32/64 braced-init and SPH_T32/T64 — verification complete, no action required

Short summary:

  • I inspected src/crypto/x11. SPH_C32/64 are defined as list-initialization (sph_u32{x}/sph_u64{x}) and SPH_T32/T64 are identity macros.
  • I searched usages across the x11 code: SPH_C32/64 are used almost exclusively with integer literals, and SPH_T32/T64 are applied to values already of the correct width or to expressions explicitly cast to sph_u32/sph_u64.

Key locations checked:

  • Definition: src/crypto/x11/sph_types.h (SPH_C32/64 and SPH_T32/T64).
  • Literal-heavy uses (no narrowing risk): src/crypto/x11/luffa.cpp, keccak.cpp, blake.cpp, skein.cpp, bmw.cpp, groestl.cpp, jh.cpp (many SPH_C32/SPH_C64 calls with hex literals).
  • SPH_T32/T64 applied to correctly-typed values or explicit casts: e.g. luffa.cpp (lines ~244–247: SPH_T32((sph_u32)(u)) / (u>>32)), shavite.cpp (~872–878: SPH_T32 on sc->count*), skein.cpp (~279–307: SPH_T64 on 64-bit additions).

Conclusion:

  • No call sites requiring implicit truncation were found, and no non-literal/wider arguments to SPH_C32/64 were detected in the x11 sources scanned. No behavioral change or narrowing error is expected from the current definitions.

55-57: C++20 header usage — verified (no C / pre-C++20 TUs include sph_types.h)

Brief: I searched the repo for direct includers and build settings. sph_types.h is only included by C++ headers in crypto/x11 and the build is configured for C++20.

Files referencing sph_types.h (direct includes):

  • src/crypto/x11/sph_echo.h
  • src/crypto/x11/sph_skein.h
  • src/crypto/x11/sph_shavite.h
  • src/crypto/x11/sph_luffa.h
  • src/crypto/x11/sph_jh.h
  • src/crypto/x11/sph_keccak.h
  • src/crypto/x11/sph_blake.h
  • src/crypto/x11/sph_simd.h
  • src/crypto/x11/sph_cubehash.h
  • src/crypto/x11/sph_bmw.h
  • src/crypto/x11/aes_helper.h
  • src/crypto/x11/sph_groestl.h
    (also src/crypto/x11/sph_types.h itself)

Build-system evidence C++20 is enabled:

  • CMakeLists.txt:11 — set(CMAKE_CXX_STANDARD 20)
  • depends/Makefile:52 — CXX_STANDARD ?= c++20

Conclusion: no C or pre-C++20 translation units directly include this header and the project defaults to C++20. No change required.

Comment on lines 381 to 383
dnl If not debugging, enable more aggressive optimizations for sphlib sources
AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Respect user overrides when adding -O3 to SPHLIB_CXXFLAGS

Currently, -O3 is appended whenever available, even if SPHLIB_CXXFLAGS is pre-set by the user. Guard the optimization so users can override it by setting SPHLIB_CXXFLAGS in the environment.

Apply:

-  dnl If not debugging, enable more aggressive optimizations for sphlib sources
-  AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
+  dnl If not debugging, enable more aggressive optimizations for sphlib sources
+  dnl Respect user overrides: only add -O3 if SPHLIB_CXXFLAGS is unset
+  if test -z "$SPHLIB_CXXFLAGS"; then
+    AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
+  fi

Optional: expose SPHLIB_CXXFLAGS in help and make intent clearer:

  • Add near other ARG_VARs:
    AC_ARG_VAR([SPHLIB_CXXFLAGS], [Extra C++ flags applied only to SPH sources (default: try -O3 if supported when not debugging)])
  • Print SPHLIB_CXXFLAGS in the summary to ease diagnostics.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dnl If not debugging, enable more aggressive optimizations for sphlib sources
AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
dnl If not debugging, enable more aggressive optimizations for sphlib sources
dnl Respect user overrides: only add -O3 if SPHLIB_CXXFLAGS is unset
if test -z "$SPHLIB_CXXFLAGS"; then
AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CXXFLAGS="$SPHLIB_CXXFLAGS -O3"], [], [$CXXFLAG_WERROR])
fi

Comment on lines 327 to 443
sph_enc16be(void *dst, unsigned val)
{
((unsigned char *)dst)[0] = (val >> 8);
((unsigned char *)dst)[1] = val;
WriteBE16(static_cast<unsigned char*>(dst), val);
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

16-bit encode/decode: prefer uint16_t for clarity and to avoid unintended narrowing

These APIs take/return “unsigned” but call Read/WriteBE/LE16 which operate on uint16_t. While implicit narrowing to 16 bits is likely intentional, the signatures could be clearer and safer if they used uint16_t consistently. At minimum, add a short note that values are truncated to 16 bits.

Also, please double-check the return type of ReadLE16/ReadBE16 in src/crypto/common.h. The provided snippet for ReadLE16 shows a void return, which would be a compile error. Validate both are declared to return uint16_t:

Also applies to: 339-343, 353-356, 365-368


🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail
rg -nA3 'Read(BE|LE)16\s*\(' src/crypto/common.h

Length of output: 297


🏁 Script executed:

#!/usr/bin/env bash
set -euo pipefail
echo "---- sph_types.h (around reported lines) ----"
sed -n '320,380p' src/crypto/x11/sph_types.h || true

echo
echo "---- grep for 16-bit sph helpers in repo ----"
rg -n --hidden --no-ignore 'sph_enc16|sph_dec16' src || true

echo
echo "---- common.h ReadLE16/ReadBE16 declarations ----"
sed -n '1,120p' src/crypto/common.h || true

Length of output: 4337


Prefer uint16_t for the 16-bit sph enc/dec helpers (avoid implicit truncation)

Verified: ReadLE16/ReadBE16 in src/crypto/common.h return uint16_t (and WriteLE16/WriteBE16 take uint16_t), so the sph helpers using plain unsigned cause implicit narrowing. Change the API to use uint16_t or at minimum document truncation.

Files/locations to update:

  • src/crypto/x11/sph_types.h
    • sph_enc16be — lines ~327-331: change parameter type to uint16_t
    • sph_dec16be — lines ~339-343: change return type to uint16_t
    • sph_enc16le — lines ~353-356: change parameter type to uint16_t
    • sph_dec16le — lines ~365-368: change return type to uint16_t
  • Call site to verify after change:
    • src/crypto/x11/echo.cpp:396 — sph_enc16le(buf + (sizeof sc->buf) - 18, out_size_w32 << 5); ensure the value fits in 16 bits or perform an explicit cast/truncation.

Suggested replacements (minimal):

  • sph_enc16be:
    static SPH_INLINE void
    sph_enc16be(void dst, uint16_t val)
    {
    WriteBE16(static_cast<unsigned char
    >(dst), val);
    }

  • sph_dec16be:
    static SPH_INLINE uint16_t
    sph_dec16be(const void src)
    {
    return ReadBE16(static_cast<const unsigned char
    >(src));
    }

  • sph_enc16le:
    static SPH_INLINE void
    sph_enc16le(void dst, uint16_t val)
    {
    WriteLE16(static_cast<unsigned char
    >(dst), val);
    }

  • sph_dec16le:
    static SPH_INLINE uint16_t
    sph_dec16le(const void src)
    {
    return ReadLE16(static_cast<const unsigned char
    >(src));
    }

Reason: using uint16_t matches the common.h helpers, makes intent clear, and prevents accidental promotion/narrowing surprises. Please apply the changes and review call sites that pass wider integer expressions.

@UdjinM6
Copy link

UdjinM6 commented Aug 14, 2025

macbook m1pro, --enable-debug --enable-crash-hooks --enable-werror

develop:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              247.60 |        4,038,748.96 |    1.2% |      0.01 | `X11_0032b_single`
|               99.31 |       10,069,225.93 |    1.6% |      0.01 | `X11_0080b_single`
|               64.64 |       15,470,418.64 |    0.8% |      0.01 | `X11_0128b_single`
|               16.65 |       60,077,527.50 |    0.1% |      0.01 | `X11_0512b_single`
|                9.20 |      108,738,453.20 |    2.4% |      0.01 | `X11_1024b_single`
|                1.33 |      751,103,182.80 |    1.4% |      0.01 | `X11_1M`
|                5.14 |      194,458,106.14 |    0.3% |      0.01 | `X11_2048b_single`

this PR:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              454.24 |        2,201,461.69 |    0.0% |      0.01 | `Pow_X11_0032b`
|              236.19 |        4,233,832.86 |    5.4% |      0.01 | :wavy_dash: `Pow_X11_0080b` (Unstable with ~49.0 iters. Increase `minEpochIterations` to e.g. 490)
|              177.41 |        5,636,699.32 |    2.9% |      0.01 | `Pow_X11_0128b`
|               46.88 |       21,329,210.67 |    0.4% |      0.01 | `Pow_X11_0512b`
|               25.31 |       39,504,360.41 |    1.1% |      0.01 | `Pow_X11_1024b`
|                2.45 |      408,712,441.37 |   12.8% |      0.03 | :wavy_dash: `Pow_X11_1M` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
|               12.17 |       82,186,375.17 |   13.8% |      0.01 | :wavy_dash: `Pow_X11_2048b` (Unstable with ~35.5 iters. Increase `minEpochIterations` to e.g. 355)

2-3x slower for me 🤷‍♂️

@kwvg kwvg marked this pull request as draft August 14, 2025 11:02
Alongside availing more aggressive optimization if the compiler supports
it, Cubehash and JH perform better with their small footprint variants
so let's default to those variants. We aren't enabling _all_ small
footprint variants as algorithms like Groestl perform significantly
worse.
@kwvg kwvg marked this pull request as ready for review August 15, 2025 12:48
@kwvg kwvg requested review from UdjinM6 and removed request for UdjinM6 August 15, 2025 12:48
UdjinM6
UdjinM6 previously approved these changes Aug 15, 2025
Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              247.67 |        4,037,604.47 |    0.8% |      0.01 | `Pow_X11_0032b`
|               98.00 |       10,204,153.94 |    0.5% |      0.01 | `Pow_X11_0080b`
|               62.80 |       15,924,084.99 |    0.4% |      0.01 | `Pow_X11_0128b`
|               16.76 |       59,674,229.76 |    0.3% |      0.01 | `Pow_X11_0512b`
|                9.11 |      109,801,606.72 |    0.7% |      0.01 | `Pow_X11_1024b`
|                1.34 |      748,829,766.28 |    0.5% |      0.01 | `Pow_X11_1M`
|                5.16 |      193,880,884.27 |    0.6% |      0.01 | `Pow_X11_2048b`

light ACK 30bf19d

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/hash_x11.h (2)

69-71: Const-correctness: pass const void to sph_luffa512 like other stages*

All other stages pass a const pointer; Luffa’s stage should be consistent to avoid accidental non-const aliasing and to match typical SPH prototypes.

Apply this diff:

-    sph_luffa512(&ctx_luffa, static_cast<void*>(&hash[5]), 64);
+    sph_luffa512(&ctx_luffa, static_cast<const void*>(&hash[5]), 64);

42-43: Add a compile-time guard for uint512 size

Given the assumption of true 64-bit targets and 512-bit digests, a compile-time check helps catch packing/ABI regressions across platforms or future refactors.

You can insert this assertion near the hash buffer declaration:

     uint512 hash[11];
+    static_assert(sizeof(uint512) == 64, "uint512 must be 64 bytes");
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 30bf19d and 44eebe7.

📒 Files selected for processing (1)
  • src/hash_x11.h (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/hash_x11.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: linux64-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: linux64_multiprocess-build / Build source
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
🔇 Additional comments (1)
src/hash_x11.h (1)

40-40: pblank made const and zero-initialized — good defensive fix

This ensures a valid, immutable pointer is supplied for zero-length inputs and avoids any accidental writes.

Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 44eebe7

@PastaPastaPasta
Copy link
Member

PastaPastaPasta commented Aug 18, 2025

sphlib requires platforms to be SPH_64 in order to build variants like Blake512 (source) and Bmw512 (source). While other variants have alternate implementations for non-SPH_64 platforms like Groestl (source), non-SPH_64 platforms cannot build Blake512 or Bmw512.
As X11 (and by extension, Dash) requires both, we can safely assume that Dash Core doesn't support non-SPH_64 targets and can remove fallback code.

This doesn't fully make sense to me, as we shipped 32bit binaries historically. I must be missing something. What exactly does SPH_64 mean?

@kwvg
Copy link
Collaborator Author

kwvg commented Aug 19, 2025

What exactly does SPH_64 mean?

@PastaPastaPasta, SPH_64 does not mean a 64-bit architecture, it means that a 64-bit type is representable on the platform, which holds true after C99 as a matter of spec (see below) with additional checks for pre-C99 platforms.

#if defined __STDC__ && __STDC_VERSION__ >= 199901L
/*
* On C99 implementations, we can use <stdint.h> to get an exact 64-bit
* type, if any, or otherwise use a wider type (which must exist, for
* C99 conformance).
*/
#include <stdint.h>
#ifdef UINT32_MAX
typedef uint32_t sph_u32;
typedef int32_t sph_s32;
#else
typedef uint_fast32_t sph_u32;
typedef int_fast32_t sph_s32;
#endif
#if !SPH_NO_64
#ifdef UINT64_MAX
typedef uint64_t sph_u64;
typedef int64_t sph_s64;
#else
typedef uint_fast64_t sph_u64;
typedef int_fast64_t sph_s64;
#endif
#endif
#define SPH_C32(x) ((sph_u32)(x))
#if !SPH_NO_64
#define SPH_C64(x) ((sph_u64)(x))
#define SPH_64 1
#endif

@PastaPastaPasta PastaPastaPasta merged commit a6f55ee into dashpay:develop Aug 19, 2025
55 of 61 checks passed
PastaPastaPasta added a commit that referenced this pull request Sep 25, 2025
…on for Echo512 and Shavite512, improve benchmark stability

005967b crypto: drop naive AES backends (Kittywhiskers Van Gogh)
d131fcc crypto: implement Shavite512's full `Compress()` routine (Kittywhiskers Van Gogh)
31f93ad crypto: unroll Echo512's `FullStateRound()` (Kittywhiskers Van Gogh)
fa68c70 crypto: implement ARM NEON backend for Echo512's `ShiftAndMix()` (Kittywhiskers Van Gogh)
963215b crypto: implement ARM AES backend for Shavite512's `CompressElement()` (Kittywhiskers Van Gogh)
959c9ee crypto: implement ARM AES backend for Echo512's `FullStateRound()` (Kittywhiskers Van Gogh)
76bd236 crypto: implement naive ARM AES backend for simple rounds (Kittywhiskers Van Gogh)
f2ececc crypto: implement AES-NI backend for Shavite512's `CompressElement()` (Kittywhiskers Van Gogh)
e1bbec4 crypto: avoid extra load/store in Echo512's `ShiftAndMix()` (Kittywhiskers Van Gogh)
250dcce crypto: combine Echo512's Shift and Mix operations (Kittywhiskers Van Gogh)
71d6ef9 crypto: implement SSSE3 backend for Echo512's `ShiftRows()` (Kittywhiskers Van Gogh)
da38871 crypto: implement SSSE3 backend for Echo512's `MixColumns()` (Kittywhiskers Van Gogh)
31a6732 crypto: implement AES-NI backend for Echo512's `FullStateRound()` (Kittywhiskers Van Gogh)
1f2eb21 crypto: implement naive AES-NI backend for simple rounds (Kittywhiskers Van Gogh)
cba39c5 const: use function pointer to allow for switching implementation (Kittywhiskers Van Gogh)
d6d3518 crypto: replace hardcoded AES transform tables with constexpr tables (Kittywhiskers Van Gogh)
20fc998 refactor: move software AES round to header (Kittywhiskers Van Gogh)
386742b refactor: remove large footprint Shavite512 impl, switch to C++ (Kittywhiskers Van Gogh)
7e8607e refactor: remove large footprint Echo512 impl, switch to C++ (Kittywhiskers Van Gogh)
c4e7a40 fix: suppress unaligned memory UBSan warnings if supported by arch (Kittywhiskers Van Gogh)
830928c bench: set minimum epoch iterations to improve `pow_hash` stability (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  ### Important

  * Due to the build system changes done in this PR, switching between commits in this branch or between this branch and any other branch (like `develop`) will likely require `git -dfx src/bench src/crypto src/primitives` and re-running `./autogen.sh` and `./configure`.

  * As the dispatcher doesn't kick in until _after_ genesis block validation, the correctness of the implementation can be validated by applying a commit that would retrigger the validation in the benchmark ([commit](23d15b2)). This is not included in this PR as it negatively impacts benchmark results in some builds ([comment](#6796 (comment))).

  * While working on the ARMv7/ARMv8 backends, it was noticed that the `pow_hash` benchmark's error rate on Apple Silicon can go as high as ~15%, even on repeated runs, which reduces the reliability of benchmark results and hinders decision-making. To mitigate this, `minEpochIterations` has been reintroduced and set to 20 (up from pre-[dash#6752](#6752 10).

    Baseline measurements were taken against this change _before_ large footprint removal.

  ### Misc.

  * `libdashconsensus` is built with hardware optimizations disabled, which is currently set with `DISABLE_OPTIMIZED_SHA256` (introduced in [bitcoin#29180](bitcoin#29180)). To align with upstream behavior, our platform-specific code is disabled in the library as well using the macro to mean "disable optimizations".

  * Despite not being specified in Apple's documentation ([source](https://developer.apple.com/documentation/kernel/1387446-sysctlbyname/determining_instruction_set_characteristics)), some versions of macOS report NEON (a.k.a. advanced SIMD) using `hw.optional.arm.AdvSIMD` instead of `hw.optional.AdvSIMD`, so we check for that `sysctl` as well (see [google/cpufeatures#390](google/cpu_features#390))

  * Unaligned memory accesses are hardware-supported by platforms like x86_64 and sphlib utilizes them to improve performance ([source](https://github.com/dashpay/dash/blob/893b46a000c5088ce92f8625e74d7e3c126e0cdb/src/crypto/x11/sph_types.h#L117-L132)). When switching to the small footprint implementation of Shavite512, this triggers a UBSan error ([build](https://github.com/dashpay/dash/actions/runs/17896732544/job/50884688479#step:10:292)).

    As this is intentional behavior, we have opted to suppress the error by conditionally suppressing alignment sanitization _if_ the target platform supports it and it has been enabled (i.e. `SPH_UPTR` is set, which it is by default).

  ## Benchmarks

  ### Apple M1, macOS Sequoia 15.7

  * **Build Information:** Apple Clang 17 (clang-1700.3.19.1), Xcode 26.0 (17A324)
  * **Depends Config:** `MULTIPROCESS=1 CC=clang CXX=clang++ HOST=aarch64-apple-darwin`
  * **Build Config:** `CC=clang CXX=clang++ CFLAGS="-O2 -g" LDFLAGS="-Wl,-O2" ./configure --prefix=$(pwd)/depends/aarch64-apple-darwin --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccache`

  **Baseline**

  * <details>

    <summary>Echo512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               39.68 |       25,201,342.83 |    0.2% |      0.01 | `Pow_Echo512_0032b`
    |               16.09 |       62,143,122.31 |    1.2% |      0.01 | `Pow_Echo512_0080b`
    |               19.63 |       50,943,039.85 |    0.2% |      0.01 | `Pow_Echo512_0128b`
    |               12.24 |       81,674,975.07 |    0.9% |      0.01 | `Pow_Echo512_0512b`
    |               10.94 |       91,442,178.90 |    0.4% |      0.01 | `Pow_Echo512_1024b`
    |               10.30 |       97,063,958.41 |    0.3% |      0.01 | `Pow_Echo512_2048b`
    |                9.72 |      102,906,236.78 |    0.4% |      0.11 | `Pow_Echo512_1M`
    ```

    </details>

  * <details>

    <summary>Shavite512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               15.18 |       65,869,311.23 |    0.5% |      0.01 | `Pow_Shavite512_0032b`
    |                6.08 |      164,380,032.21 |    0.5% |      0.01 | `Pow_Shavite512_0080b`
    |                7.50 |      133,291,221.55 |    0.4% |      0.01 | `Pow_Shavite512_0128b`
    |                4.69 |      213,338,740.34 |    0.6% |      0.01 | `Pow_Shavite512_0512b`
    |                4.25 |      235,339,069.77 |    0.8% |      0.01 | `Pow_Shavite512_1024b`
    |                4.02 |      248,784,999.42 |    0.7% |      0.01 | `Pow_Shavite512_2048b`
    |                3.75 |      266,841,572.42 |    0.4% |      0.04 | `Pow_Shavite512_1M`
    ```

    </details>

  * <details>

    <summary>X11:</summary>

    ```
    |             ns/byte |              byte/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |              249.10 |        4,014,450.39 |    0.2% |      0.01 | `Pow_X11_0032b`
    |              100.76 |        9,924,687.87 |    0.7% |      0.01 | `Pow_X11_0080b`
    |               63.72 |       15,694,749.39 |    0.4% |      0.01 | `Pow_X11_0128b`
    |               16.99 |       58,861,757.72 |    0.3% |      0.01 | `Pow_X11_0512b`
    |                9.27 |      107,817,565.81 |    0.2% |      0.01 | `Pow_X11_1024b`
    |                5.28 |      189,479,026.11 |    0.8% |      0.01 | `Pow_X11_2048b`
    |                1.37 |      731,462,009.69 |    0.7% |      0.01 | `Pow_X11_1M`
    ```

    </details>

  **Optimized**

  * <details>

    <summary>Echo512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               14.34 |       69,741,448.00 |    0.8% |      0.01 | `Pow_Echo512_0032b`
    |                5.68 |      176,124,721.60 |    0.0% |      0.01 | `Pow_Echo512_0080b`
    |                7.32 |      136,556,992.52 |    1.5% |      0.01 | `Pow_Echo512_0128b`
    |                4.31 |      232,035,714.32 |    0.1% |      0.01 | `Pow_Echo512_0512b`
    |                3.97 |      251,961,418.41 |    0.8% |      0.01 | `Pow_Echo512_1024b`
    |                3.62 |      276,147,242.57 |    0.1% |      0.01 | `Pow_Echo512_2048b`
    |                3.43 |      291,830,314.57 |    0.4% |      0.82 | `Pow_Echo512_1M`
    ```

    </details>

  * <details>

    <summary>Shavite512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               17.16 |       58,273,414.75 |    0.9% |      0.01 | `Pow_Shavite512_0032b`
    |                6.81 |      146,755,834.13 |    0.2% |      0.01 | `Pow_Shavite512_0080b`
    |                8.34 |      119,878,248.65 |    1.2% |      0.01 | `Pow_Shavite512_0128b`
    |                5.05 |      197,863,716.10 |    0.7% |      0.01 | `Pow_Shavite512_0512b`
    |                4.55 |      219,843,493.16 |    0.8% |      0.01 | `Pow_Shavite512_1024b`
    |                4.40 |      227,519,688.25 |    0.5% |      0.01 | `Pow_Shavite512_2048b`
    |                3.79 |      263,911,928.82 |    0.1% |      0.91 | `Pow_Shavite512_1M`
    ```

    </details>

  * <details>

    <summary>X11:</summary>

    ```
    |             ns/byte |              byte/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |              224.98 |        4,444,913.98 |    1.1% |      0.01 | `Pow_X11_0032b`
    |               90.76 |       11,017,721.52 |    1.7% |      0.01 | `Pow_X11_0080b`
    |               57.66 |       17,343,058.47 |    0.6% |      0.01 | `Pow_X11_0128b`
    |               15.79 |       63,347,591.25 |    0.3% |      0.01 | `Pow_X11_0512b`
    |                8.39 |      119,204,486.51 |    0.7% |      0.01 | `Pow_X11_1024b`
    |                4.98 |      200,909,248.89 |    1.1% |      0.01 | `Pow_X11_2048b`
    |                1.33 |      751,189,042.44 |    0.2% |      0.32 | `Pow_X11_1M`
    ```

    </details>

  ### AMD Ryzen 5 5600G, Ubuntu 24.04 (Noble)

  * **Build Information:** Ubuntu Clang 18.1.8
  * **Depends Config:** `MULTIPROCESS=1 CC=clang-18 CXX=clang++-18 HOST=x86_64-pc-linux-gnu`
  * **Build Config:** `CC=clang-18 CXX=clang++-18 CFLAGS="-O2 -g" LDFLAGS="-Wl,--as-needed -Wl,-O2" ./configure --prefix=$(pwd)/depends/x86_64-pc-linux-gnu --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccache`

  **Baseline**

  * <details>

    <summary>Echo512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |               51.81 |       19,300,083.40 |    0.2% |          916.31 |          230.13 |  3.982 |           6.19 |    0.5% |      0.01 | `Pow_Echo512_0032b`
    |               20.72 |       48,266,658.57 |    0.1% |          366.56 |           92.01 |  3.984 |           2.49 |    0.5% |      0.01 | `Pow_Echo512_0080b`
    |               25.67 |       38,953,174.42 |    0.1% |          456.97 |          113.97 |  4.009 |           2.95 |    0.5% |      0.01 | `Pow_Echo512_0128b`
    |               16.00 |       62,491,477.70 |    0.1% |          285.25 |           71.06 |  4.014 |           1.81 |    0.6% |      0.01 | `Pow_Echo512_0512b`
    |               14.38 |       69,562,745.47 |    0.1% |          256.63 |           63.83 |  4.020 |           1.63 |    0.6% |      0.01 | `Pow_Echo512_1024b`
    |               13.65 |       73,272,852.80 |    0.1% |          242.32 |           60.27 |  4.020 |           1.53 |    0.6% |      0.01 | `Pow_Echo512_2048b`
    |               12.84 |       77,877,131.11 |    0.0% |          228.02 |           56.70 |  4.022 |           1.44 |    0.6% |      3.07 | `Pow_Echo512_1M`
    ```

    </details>

  * <details>

    <summary>Shavite512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |               26.75 |       37,379,255.99 |    0.0% |          423.31 |          118.82 |  3.563 |           1.41 |    0.0% |      0.01 | `Pow_Shavite512_0032b`
    |               10.71 |       93,360,115.43 |    0.0% |          169.35 |           47.57 |  3.560 |           0.58 |    0.0% |      0.01 | `Pow_Shavite512_0080b`
    |               13.26 |       75,436,781.19 |    0.0% |          210.04 |           58.87 |  3.568 |           0.45 |    0.0% |      0.01 | `Pow_Shavite512_0128b`
    |                8.26 |      121,051,101.55 |    0.0% |          130.99 |           36.69 |  3.570 |           0.27 |    0.0% |      0.01 | `Pow_Shavite512_0512b`
    |                7.44 |      134,444,187.57 |    0.0% |          117.82 |           33.04 |  3.566 |           0.24 |    0.4% |      0.01 | `Pow_Shavite512_1024b`
    |                7.06 |      141,686,987.23 |    0.2% |          111.23 |           31.18 |  3.568 |           0.23 |    0.2% |      0.01 | `Pow_Shavite512_2048b`
    |                6.65 |      150,409,844.26 |    0.1% |          104.65 |           29.37 |  3.563 |           0.21 |    0.0% |      1.60 | `Pow_Shavite512_1M`
    ```

    </details>

  * <details>

    <summary>X11:</summary>

    ```
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |              326.13 |        3,066,305.67 |    0.0% |        5,979.73 |        1,448.47 |  4.128 |          41.00 |    0.1% |      0.01 | `Pow_X11_0032b`
    |              130.47 |        7,664,483.17 |    0.0% |        2,391.79 |          579.50 |  4.127 |          16.38 |    0.1% |      0.01 | `Pow_X11_0080b`
    |               83.00 |       12,048,837.15 |    0.0% |        1,516.78 |          368.62 |  4.115 |          10.30 |    0.1% |      0.01 | `Pow_X11_0128b`
    |               21.78 |       45,920,017.94 |    0.0% |          395.29 |           96.72 |  4.087 |           2.64 |    0.1% |      0.01 | `Pow_X11_0512b`
    |               11.50 |       86,977,849.93 |    0.0% |          208.38 |           51.07 |  4.081 |           1.36 |    0.1% |      0.01 | `Pow_X11_1024b`
    |                6.38 |      156,826,545.04 |    0.0% |          114.92 |           28.16 |  4.081 |           0.72 |    0.1% |      0.01 | `Pow_X11_2048b`
    |                1.21 |      827,082,388.48 |    0.1% |           21.65 |            5.34 |  4.052 |           0.09 |    0.0% |      0.29 | `Pow_X11_1M`
    ```

    </details>

  **Optimized**

  * <details>

    <summary>Echo512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |               17.52 |       57,085,952.31 |    0.0% |          166.19 |           77.77 |  2.137 |           7.88 |    0.0% |      0.01 | `Pow_Echo512_0032b`
    |                7.01 |      142,732,714.40 |    0.1% |           66.51 |           31.10 |  2.139 |           3.16 |    0.0% |      0.01 | `Pow_Echo512_0080b`
    |                8.69 |      115,084,427.41 |    0.1% |           81.58 |           38.56 |  2.116 |           3.76 |    0.0% |      0.01 | `Pow_Echo512_0128b`
    |                5.36 |      186,499,101.72 |    0.1% |           50.51 |           23.79 |  2.123 |           2.31 |    0.1% |      0.01 | `Pow_Echo512_0512b`
    |                4.79 |      208,614,198.78 |    0.1% |           45.33 |           21.29 |  2.129 |           2.07 |    0.0% |      0.01 | `Pow_Echo512_1024b`
    |                4.51 |      221,584,256.28 |    0.0% |           42.74 |           20.04 |  2.133 |           1.95 |    0.0% |      0.01 | `Pow_Echo512_2048b`
    |                4.27 |      234,154,680.31 |    0.0% |           40.15 |           18.95 |  2.119 |           1.83 |    0.0% |      1.02 | `Pow_Echo512_1M`
    ```

    </details>

  * <details>

    <summary>Shavite512:</summary>

    ```
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |               10.12 |       98,853,454.39 |    0.5% |           69.47 |           44.93 |  1.546 |           8.38 |    0.0% |      0.01 | `Pow_Shavite512_0032b`
    |                3.98 |      251,044,756.51 |    0.1% |           27.81 |           17.69 |  1.572 |           3.36 |    0.0% |      0.01 | `Pow_Shavite512_0080b`
    |                4.89 |      204,696,173.21 |    0.1% |           32.78 |           21.69 |  1.511 |           3.90 |    0.0% |      0.01 | `Pow_Shavite512_0128b`
    |                3.01 |      332,315,588.24 |    0.1% |           20.08 |           13.37 |  1.502 |           2.42 |    0.1% |      0.01 | `Pow_Shavite512_0512b`
    |                2.70 |      369,999,483.82 |    0.2% |           17.96 |           12.00 |  1.496 |           2.17 |    0.0% |      0.01 | `Pow_Shavite512_1024b`
    |                2.55 |      392,427,831.38 |    0.0% |           16.90 |           11.32 |  1.493 |           2.05 |    0.0% |      0.01 | `Pow_Shavite512_2048b`
    |                2.39 |      418,541,658.56 |    0.0% |           15.84 |           10.60 |  1.494 |           1.92 |    0.0% |      0.57 | `Pow_Shavite512_1M`
    ```

    </details>

  * <details>

    <summary>X11:</summary>

    ```
    |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |              271.74 |        3,679,981.60 |    0.1% |        4,875.78 |        1,206.89 |  4.040 |          49.66 |    0.0% |      0.01 | `Pow_X11_0032b`
    |              108.70 |        9,199,814.12 |    0.2% |        1,950.20 |          482.28 |  4.044 |          19.84 |    0.0% |      0.01 | `Pow_X11_0080b`
    |               69.40 |       14,408,573.10 |    0.2% |        1,240.79 |          308.15 |  4.027 |          12.47 |    0.0% |      0.01 | `Pow_X11_0128b`
    |               18.33 |       54,546,702.44 |    0.1% |          326.29 |           81.43 |  4.007 |           3.18 |    0.0% |      0.01 | `Pow_X11_0512b`
    |                9.78 |      102,262,967.62 |    0.2% |          173.88 |           43.39 |  4.007 |           1.63 |    0.0% |      0.01 | `Pow_X11_1024b`
    |                5.48 |      182,512,076.76 |    0.1% |           97.67 |           24.33 |  4.015 |           0.86 |    0.0% |      0.01 | `Pow_X11_2048b`
    |                1.20 |      835,292,126.52 |    0.0% |           21.62 |            5.31 |  4.068 |           0.09 |    0.0% |      0.29 | `Pow_X11_1M`
    ```

    </details>

  ## Breaking Changes

  Platforms that partially support the set of extensions for x86_64 (SSSE3, SSE4.1 and AES-NI) or ARMv7/ARMv8 (NEON and crypto extensions for AES) may experience different performance characteristics.

  The following cases may experience performance degradation due to moving to a small-footprint variant and the overhead (and lost optimization opportunity) incurred by runtime dispatching.

  * Platforms other than x86_64 or ARMv7 and above (there are no backends that implement optimized routines for them)
  * x86_64 or ARMv7/ARMv8, without extensions (i.e. older chips, OS/Hypervisor-level disablement of extensions)
  * Operating systems other than Windows, macOS, Linux and FreeBSD (there are no dispatch routines for other platforms)
  * Using `libdashconsensus` (optimizations are disabled wholesale for the library following upstream behavior)

  ## Checklist

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [x] I have added or updated relevant unit/integration/functional/e2e tests **(note: N/A)**
  - [x] I have made corresponding changes to the documentation **(note: N/A)**
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  UdjinM6:
    utACK/light-ACK 005967b

Tree-SHA512: 8b23f8e4e591faa4fbdb75ef85a64fed5f0429c804e650e8389789468d7e594998b11228794d650adfaeaf65085d157f98533983a7893d8db5ba587341c86e44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants